AITopics | Cienfuegos Province

MessIRve: A Large-Scale Spanish Information Retrieval Dataset

Valentini, Francisco, Cotik, Viviana, Furman, Damián, Bercovich, Ivan, Altszyler, Edgar, Pérez, Juan Manuel

arXiv.org Artificial Intelligence, Sep-9-2024

Information retrieval (IR) is the task of finding relevant documents in response to a user query. Although Spanish is the second most spoken native language, current IR benchmarks lack Spanish data, hindering the development of information access tools for Spanish speakers. We introduce MessIRve, a large-scale Spanish IR dataset with around 730 thousand queries from Google's autocomplete API and relevant documents sourced from Wikipedia. MessIRve's queries reflect diverse Spanish-speaking regions, unlike other datasets that are translated from English or do not consider dialectal variations. The large size of the dataset allows it to cover a wide variety of topics, unlike smaller datasets. We provide a comprehensive description of the dataset, comparisons with existing datasets, and baseline evaluations of prominent IR models. Our contributions aim to advance Spanish IR research and improve information access for Spanish speakers.

dataset, messirve, query, (15 more...)


2409.05994

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
North America > Mexico (0.04)
South America > Colombia > Bogotá D.C. > Bogotá (0.04)
(34 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Deep Learning Driven Detection of Tsunami Related Internal Gravity Waves: a path towards open-ocean natural hazards detection

Constantinou, Valentino, Ravanelli, Michela, Liu, Hamlin, Bortnik, Jacob

arXiv.org Artificial Intelligence, Aug-8-2023

Tsunamis can trigger internal gravity waves (IGWs) in the ionosphere, perturbing the Total Electron Content (TEC). These perturbations, referred to as Traveling Ionospheric Disturbances (TIDs), are detectable through the Global Navigation Satellite System (GNSS). The GNSS comprises constellations of satellites providing signals from Earth orbit: Europe's Galileo, the United States' Global Positioning System (GPS), Russia's Global'naya Navigatsionnaya Sputnikovaya Sistema (GLONASS) and China's BeiDou. The real-time detection of TIDs provides an approach for tsunami detection, enhancing early warning systems by providing open-ocean coverage in geographic areas not serviceable by buoy-based warning systems. Deep learning can leverage the large volumes of GNSS data, effectively handling complex non-linear relationships across thousands of data streams. We describe a framework that combines slant total electron content (sTEC) from the VARION (Variometric Approach for Real-Time Ionosphere Observation) algorithm with Gramian Angular Difference Fields (from Computer Vision) and Convolutional Neural Networks (CNNs) to detect TIDs in near-real-time. Historical data from the 2010 Maule, 2011 Tohoku and 2012 Haida-Gwaii earthquakes and tsunamis are used in model training, and the later-occurring 2015 Illapel earthquake and tsunami in Chile are used for out-of-sample model validation. Using the experimental framework described in the paper, we achieved a 91.7% F1 score. Source code is available at: https://github.com/vc1492a/tidd. Our work represents a new frontier in detecting tsunami-driven IGWs in the open ocean, dramatically improving the potential for natural hazards detection for coastal communities.
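
The abstract names Gramian Angular Difference Fields as the step that turns a time series into an image a CNN can consume. A minimal sketch of the standard construction follows: rescale the series to [-1, 1], map each value to an angle with arccos, and fill the image with sin(phi_i - phi_j). This is an illustration only, not the authors' implementation (see the linked repository for that); the function name `gramian_angular_difference_field` and the toy input are assumptions made for this example.

```python
import math


def gramian_angular_difference_field(series):
    """Return the GADF of a 1-D series as a list of rows (hypothetical helper)."""
    values = [float(v) for v in series]
    lo, hi = min(values), max(values)
    span = hi - lo

    # Rescale to [-1, 1] so that arccos is defined for every sample.
    # A constant series has no range, so it is mapped to all zeros.
    if span == 0:
        scaled = [0.0 for _ in values]
    else:
        scaled = [2.0 * (v - lo) / span - 1.0 for v in values]

    # Guard against floating-point drift just outside [-1, 1].
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]

    # Polar encoding: each sample becomes an angle in [0, pi].
    phi = [math.acos(s) for s in scaled]

    # Entry (i, j) is sin(phi_i - phi_j), so the matrix is antisymmetric.
    return [[math.sin(a - b) for b in phi] for a in phi]


# Toy input standing in for a short window of sTEC samples (made-up values).
field = gramian_angular_difference_field([0.0, 1.0, 2.0])
```

Each window of length n yields an n-by-n image whose diagonal is zero and whose off-diagonal entries encode pairwise temporal differences; how windows are chosen and fed to the CNN is not specified in the abstract.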

detection, earthquake, tsunami, (15 more...)


2308.04611

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
North America > Canada > British Columbia > Haida Gwaii (0.26)
Europe > Russia (0.24)
(15 more...)

Genre: Research Report (0.50)

Industry: Energy (0.69)

Technology:

Information Technology > Geographic Information Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
